Statistics in Medicine

17 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
An E-value-Informed Sensitivity Analysis Framework for Hybrid Controlled Trials
2026-03-06 epidemiology 10.64898/2026.03.05.26347653
#1 (3.5%)

Hybrid controlled trials (HCTs) incorporate real-world data into randomized controlled trials (RCTs) by augmenting the internal control arm with patients receiving the same treatment in routine care. Beyond increasing power, HCTs may improve recruitment by supporting unequal randomization ratios that increase patient access to experimental treatments. However, HCT validity is threatened by bias from unmeasured confounding due to lack of randomization of external controls, leading to outcome non-...

2
Federated penalized piecewise exponential model for horizontally distributed survival data: FedPPEM
2026-02-12 health informatics 10.64898/2026.02.11.26346054
Top 0.1% (1.9%)

Cox proportional hazards regression models are frequently employed to develop prognostic models for time-to-event data, considering both patient-specific and disease-specific characteristics. In high-dimensional clinical modeling, these biological features can exhibit high collinearity due to inter-feature relationships, potentially causing instability and numerical issues during estimation without regularization. For rare diseases such as acute myeloid leukemia (AML), the sparsity and scarcity of data...

3
Using Negative Control Outcomes to Detect Selection Bias in Mendelian Randomization Studies
2026-02-01 epidemiology 10.64898/2026.01.30.26345215
Top 0.2% (1.5%)

Mendelian randomization is currently implemented mainly by using genetic variants as instrumental variables to investigate the causal effect of an exposure on an outcome of interest. Mendelian randomization studies are robust to confounding bias and reverse causation, but they remain susceptible to selection bias; for example, this can happen if the exposure or outcome is associated with selection into the study sample. Negative controls are sometimes used to detect biases (typically ...

4
Correcting for effect modification in the doubly-ranked non-linear Mendelian randomization method
2026-01-23 epidemiology 10.64898/2026.01.22.26344640
Top 0.2% (1.5%)

The doubly-ranked non-linear Mendelian randomization method can yield biased estimates when instrument strength varies across individuals due to gene-environment (GxE) interactions. We propose a simple strategy to mitigate this bias by modelling GxE interactions and removing the fitted GxE component from the exposure before stratification by the doubly-ranked method. In simulations, the proposed GxE correction strategy eliminated GxE-induced bias with null, linear and non-linear exposure-outcome...

5
Aging Out of the Blue: Estimating and Calibrating Region-specific Epigenetic Clocks for a Blue Zone via SuperLearner
2026-03-03 epidemiology 10.64898/2026.03.02.26346901
Top 0.3% (1.2%)

Epigenetic clocks estimate biological age from DNA methylation patterns at CpG sites, providing robust predictions of mortality and morbidity risk. "Blue zones" (regions of exceptional longevity) offer a unique opportunity to investigate how biological aging diverges from chronological age. However, standard clocks are typically trained on large, heterogeneous datasets, reflecting average population trends rather than region-specific dynamics. Using data from the Costa Rican Longevity and Health...

6
Assessing the Role of Model Complexity in Virtual Clinical Trial Outcomes
2025-12-27 pharmacology and therapeutics 10.64898/2025.12.22.25342808
Top 0.3% (1.1%)

Virtual clinical trials (VCTs) hold significant promise for improving the drug development process, yet their predictive reliability depends critically on design decisions that remain poorly understood. This study examines how model complexity influences VCT outcomes, as well as how the choice of prior parameter distributions and virtual patient inclusion criteria affects those outcomes. Using oncolytic virotherapy treatment of murine tumors as a case study, we compared three mathematical models...

7
Physiology-Informed Conditional Variational Autoencoder for Generating Pediatric Virtual Patients
2026-01-24 pharmacology and therapeutics 10.64898/2026.01.21.26344442
Top 0.3% (1.0%)

Reliable pediatric virtual patients are essential for model-informed simulations, including physiologically based pharmacokinetic (PBPK) modeling, to support dose selections in children and to evaluate drug exposure across developmental stages. Despite the availability of extensive pediatric physiological data and age- or size-based models, there remains a lack of well-established, flexible, and scalable approaches for integrating these data into realistic pediatric virtual patients that preserv...

8
Novel Representations of Vaccine Protection Against Progression to Severe Disease Over Time
2026-02-14 epidemiology 10.64898/2026.02.12.26346197
Top 0.4% (0.9%)

Background: Vaccines can prevent severe disease by preventing infection or by reducing progression among those who become infected. Vaccine effectiveness against progression given infection is often used to quantify this second mechanism, but it conditions on infection, which is itself affected by vaccination. As a result, this estimand lacks a clear causal interpretation and may behave non-intuitively over time. Methods: We introduce a conceptual framework that models protection against infection ...

9
An equilibrium solution to the elective waiting list problem
2025-12-29 health policy 10.64898/2025.12.29.25343140
Top 0.4% (0.8%)

In many countries, demand exceeds supply for elective (non-emergency) hospital treatment, such as hip replacements and cataract removals. This leads to the formation of a waiting list, which patients join on referral from their family doctor and leave either by receiving treatment or by reneging for other reasons (deconditioning, seeking private healthcare, etc.). Adequate performance is commonly incentivised through the imposition of targets on waiting times. In the first study to do so, we develop a...

10
A Pharmacogenomic-Informed Representation Improves Multimodal EHR Survival Prediction
2026-01-30 health informatics 10.64898/2026.01.27.26344981
Top 0.4% (0.7%)

Background: Electronic health record (EHR)-based prognostic modeling is increasingly used in oncology, yet incorporating pharmacogenomic (PGx) knowledge derived from experimental systems into clinical prediction frameworks remains challenging. This gap is driven by fundamental mismatches between controlled drug-mutation assays and heterogeneous, incomplete real-world clinical data. Methods: We propose a representation transfer framework that integrates PGx embeddings learned from large-scale in vit...

11
Act or Defer: Error-Controlled Decision Policies for Medical Foundation Models
2026-02-26 health informatics 10.64898/2026.02.23.26346927
Top 0.4% (0.7%)

Clinical deployment of foundation models requires decision policies that operate under explicit error budgets, such as a cap on false-positive clinical calls. Strong average accuracy alone does not guarantee safety: errors can concentrate among patients selected for action, leading to harm and inefficient use of healthcare resources. Here we introduce STRATCP, a stratified conformal framework that turns foundation model predictions into decision-ready outputs through error-contro...

12
Enhancing Polygenic Risk Prediction by Modeling Quantile-Specific Genetic Effects
2025-12-29 epidemiology 10.64898/2025.12.25.25342935
Top 0.4% (0.7%)

Polygenic risk scores (PRSs) quantify an individual's genetic susceptibility to complex traits and diseases. Conventional PRSs, which are based on linear models, perform poorly for phenotypes with skewed distributions or with genetic effects that vary across the distribution. We propose a quantile regression-based PRS (QPRS) that can capture quantile-specific genetic effects. While existing PRSs provide only a single score, QPRS models genetic influences at multiple quantiles of the phenotype, th...

13
A variational sparse Gaussian-process method for detecting spatially variable genes and cellular interactions from spatial transcriptomics
2025-12-11 genetic and genomic medicine 10.64898/2025.12.10.25341956
Top 0.5% (0.7%)

Advanced spatially resolved transcriptomic (SRT) technologies preserve the spatial context of gene expression within tissues, enabling the study of context-dependent transcriptional regulation. Here, we propose VISGP, a variational sparse Gaussian-process method for analyzing spatially variable genes (SVGs) and cellular interactions from such data. VISGP utilizes variational inference and a sparse Gaussian process approximation, which efficiently models the posterior distribution with a set of indu...

14
A Governance-Driven, Real-World Data-Calibrated Health Informatics Framework for Longitudinal Utilization Forecasting in Oncology and Complex Chronic Conditions
2026-02-26 health informatics 10.64898/2026.02.23.26346919
Top 0.5% (0.7%)

Background: Healthcare utilization forecasting systems are often derived from static, annualized market share assumptions that fail to represent real-world treatment dynamics. Such approaches systematically misestimate future utilization by ignoring longitudinal treatment sequencing, discontinuation with surveillance, recurrence-driven re-entry, and provider adoption dynamics. Objective: This study proposes a reusable, governance-driven health informatics forecasting framework designed to generate ...

15
Real-World Data for Predicting Rapid Relapse Triple Negative Cancer: A Study Using NCDB and EHR Data
2026-01-30 oncology 10.64898/2026.01.28.26345096
Top 0.5% (0.7%)

Background: Many patients with triple-negative breast cancer (TNBC), particularly those who are older, Black, or insured by Medicaid, do not receive guideline-concordant treatment, despite its association with up to 4x higher survival. Early identification of patients at risk for rapid relapse may enable timely interventions and improve outcomes. This study applies machine learning (ML) to real-world data to predict risk of rapid relapse in TNBC. Methods: We trained various ML models (logistic regr...

16
Modelling serological multiplex bead assays responses: A case study from Malaysia.
2025-12-15 epidemiology 10.64898/2025.12.11.25342063
Top 0.5% (0.7%)

Background: Multiplex bead assays (MBAs) provide quantitative measurements of many analytes from small sample volumes, reducing cost and processing time compared with traditional immunoassays. These advantages have made MBAs valuable for studying diverse diseases, particularly in low-resource settings. However, most analytical approaches focus on individual diseases, while integrated surveillance platforms would benefit from methods that jointly analyse the full range of pathogens included in mult...

17
A Mendelian randomization-based drug repurposing pipeline
2026-03-02 epidemiology 10.64898/2026.02.28.26347341
Top 0.5% (0.7%)

Drug repurposing offers the opportunity to identify promising drug targets efficiently using existing data, but current efforts have limitations; in particular, there is a need for versatile but rigorous high-throughput approaches. We therefore developed a flexible, high-throughput, Mendelian randomization (MR)-based drug repurposing pipeline with three stages: 1) MR-based identification, 2) MR-based validation and prioritization, and 3) application. This pipeline can be applied to a...

18
Predicting Salmonella Typhi incidence using prevalence metrics from sentinel studies of community-onset bloodstream infections
2026-02-15 public and global health 10.64898/2026.02.13.26346225
Top 0.6% (0.7%)

Background: Typhoid fever incidence estimates are central to policy decisions on vaccine introduction and investments in non-vaccine prevention and control but are often unavailable. We explored whether prevalence metrics from sentinel studies of community-onset bloodstream infections could accurately predict local Salmonella Typhi (S. Typhi) incidence. Methods: Using a previous systematic review (January 2018-December 2024), we identified studies reporting both typhoid incidence and prevalence of ...

19
TASTE identifies shared proteomic effects on multiple related cancers
2025-12-29 epidemiology 10.64898/2025.12.19.25342717
Top 0.6% (0.5%)

Introduction: Genome-wide association studies (GWAS) have identified hundreds of variants linked to cancers, but their downstream regulatory consequences remain poorly understood. Increasing evidence suggests that related cancers share alterations of common regulatory programs. Trans-associations of cancer risk variants mediated via molecular phenotypes, such as gene expression and protein levels, can help uncover these downstream mechanisms. Further investigation of such convergence can reveal sh...

20
Accumulated Refusal Count: A Signal of Kidney Nonuse Risk
2026-01-11 transplantation 10.64898/2026.01.08.26343720
Top 0.7% (0.5%)

A substantial proportion of recovered deceased-donor (DD) kidneys go unused. Accumulated refusals by transplant centers during the offer process may signal nonuse risk, and quantifying this phenomenon could inform frameworks for rescue strategies or out-of-sequence (OOS) placement. Using OPTN data on adult DD kidneys offered for transplant in 2024, we empirically estimated the probability of nonuse as a function of accumulated refusal count (ARC). Kidneys transplanted OOS were excluded from anal...